In this report I explore rental listings in different California cities and compare rents between major metropolitan areas and their respective suburbs. Rather than answering the questions in order for California as a whole, which would bias the results toward whichever areas dominate the aggregated statistics, I split the data into three major geographical areas: the Bay Area, Los Angeles and its suburbs, and San Diego and its suburbs. I then analyze each area with respect to the questions in Assignment 4. For each area I look at how the data points cluster, to see whether there is a correlation between location and apartment price and whether certain areas contain outliers or listings that were entered incorrectly.

While looking for anomalies in the San Diego region, I found an apartment listed with 1 sqft of space but priced at 1645 dollars a month. I found this outlier because, to measure how expensive a region is, I look at the price per sqft of an apartment rather than just its price. Price per sqft better quantifies how expensive it is to live in a given neighborhood because it accounts for how big the apartments are. Since this outlier would greatly distort the analysis of whether geographical area affects how expensive a region is, I simply remove this data point. Similarly, another listing in the San Diego area shows 104544 sqft being rented out for only $675. I remove this data point as well.
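The screening step described above can be sketched as follows. This is only an illustration: the report removes the two listings by hand, whereas the size bounds `MIN_SQFT` and `MAX_SQFT` and the helper names below are assumptions introduced here. The first two toy rows mimic the two anomalies; the other rows are made up.

```python
# Toy listings: rows "a" and "b" mimic the two anomalies described above,
# rows "c" and "d" are made-up values for illustration only.
listings = [
    {"id": "a", "price": 1645, "sqft": 1},
    {"id": "b", "price": 675, "sqft": 104544},
    {"id": "c", "price": 1800, "sqft": 900},
    {"id": "d", "price": 2400, "sqft": 1200},
]

MIN_SQFT = 100      # assumed lower bound for a plausible apartment size
MAX_SQFT = 10_000   # assumed upper bound for a plausible apartment size


def price_per_sqft(listing):
    """Monthly rent divided by floor area."""
    return listing["price"] / listing["sqft"]


def is_plausible(listing, lo=MIN_SQFT, hi=MAX_SQFT):
    """Keep only listings whose stated size falls inside the assumed bounds."""
    return lo <= listing["sqft"] <= hi


cleaned = [row for row in listings if is_plausible(row)]
for row in cleaned:
    print(row["id"], round(price_per_sqft(row), 2))
```

A fixed size window is a blunt rule. Inspecting the flagged rows before dropping them, as the report does, is the safer habit.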

To see if there is a relationship between apartment size and price, we look at the plot of apartment size versus price. We take the square root of apartment size because it makes the relationship between size and price more linear, which helps us fit a linear model between size and price.

From the fitted model, every 100 sqft increase in size corresponds to a (31.9 + 1.2)^2 increase in the monthly rent of the apartment.
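A minimal sketch of fitting price against the square root of size is given below. The data are made up and generated exactly from an assumed line, so the coefficients are not the report's estimates; `fit_line`, `predict`, and `price_change` are hypothetical helper names. One thing the sketch makes visible is that, when price is linear in the square root of size, the price change for an extra 100 sqft depends on the starting size.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up sizes; rents are generated exactly from price = 200 + 50 * sqrt(sqft).
sqft = [400, 625, 900, 1225, 1600]
price = [200 + 50 * math.sqrt(s) for s in sqft]

# Regress price on the square root of size.
intercept, slope = fit_line([math.sqrt(s) for s in sqft], price)


def predict(size):
    """Predicted monthly rent for an apartment of the given size."""
    return intercept + slope * math.sqrt(size)


def price_change(size, delta=100):
    """Predicted rent change when size grows from `size` to `size + delta`."""
    return predict(size + delta) - predict(size)


print(round(price_change(400), 1), round(price_change(1600), 1))
```

In this toy fit the same 100 sqft adds more rent to a small apartment than to a large one, which is the usual consequence of a square-root transform.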

Next we look at whether the number of bedrooms has an effect on the price of the apartment. From our table we see that each additional bedroom increases the estimated price of the apartment by 279 dollars.

Since the number of bedrooms takes only a few discrete values, we can also treat it as a categorical variable. Doing so shows that each additional bedroom has a different impact on the price. In the table below, a linear model illustrates that the only bedroom count that stands out in predicting price seems to be the 4th bedroom.
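Treating bedrooms as a categorical variable can be sketched without a modeling library: with a single categorical predictor and a baseline level, the least-squares coefficients equal the differences between each group mean and the baseline mean. The rows below are made up for illustration and are not from the data set.

```python
from collections import defaultdict
from statistics import mean

# (bedrooms, monthly price) pairs; made-up values for illustration only.
rows = [
    (0, 1000), (0, 1200),
    (1, 1400), (1, 1600),
    (2, 1900), (2, 2100),
    (3, 2500),
    (4, 2400),
]

# Group prices by bedroom count.
by_bedrooms = defaultdict(list)
for bedrooms, price in rows:
    by_bedrooms[bedrooms].append(price)

group_means = {b: mean(p) for b, p in sorted(by_bedrooms.items())}

# With studios (0 bedrooms) as the baseline level, each effect is the
# difference between that group's mean price and the baseline mean.
baseline = group_means[0]
effects = {b: m - baseline for b, m in group_means.items()}

for bedrooms, effect in effects.items():
    print(bedrooms, effect)
```

In this toy data the effect for 4 bedrooms is smaller than for 3, the kind of non-monotone step a single numeric slope would hide.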

This next table illustrates that an increase in the number of bathrooms, listed horizontally, generally correlates with an increase in the number of bedrooms in an apartment, listed vertically.

These next plots illustrate the difference between when a Craigslist advertisement is posted and when the apartment is available for rent in the San Diego area. From the plots we can tell that most of the apartments are ready for rent almost immediately, only a few become available within a month, and it is very rare for an ad to be posted more than 2 months before the apartment is available.
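The gap between posting date and availability date can be computed as sketched below. The dates are made up, and the bucket boundaries (7, 30, and 60 days) are assumptions chosen here to mirror the wording above, not values taken from the report.

```python
from collections import Counter
from datetime import date

# (date posted, date available) pairs; made-up values for illustration only.
pairs = [
    (date(2016, 5, 1), date(2016, 5, 1)),
    (date(2016, 5, 1), date(2016, 5, 20)),
    (date(2016, 5, 10), date(2016, 6, 15)),
    (date(2016, 4, 1), date(2016, 6, 20)),
]


def lead_days(posted, available):
    """Days between the ad being posted and the unit becoming available."""
    return (available - posted).days


def bucket(days):
    """Assign a lead time to a coarse bucket (boundaries are assumptions).

    Negative lead times (unit available before the ad was posted) also
    land in the first bucket.
    """
    if days <= 7:
        return "within a week"
    if days <= 30:
        return "within a month"
    if days <= 60:
        return "one to two months"
    return "more than two months"


leads = [lead_days(posted, available) for posted, available in pairs]
counts = Counter(bucket(d) for d in leads)
print(leads, dict(counts))
```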

To see if there is a correlation between geographical area and apartment price, we look at a map of San Diego and plot the price per square foot of each apartment on top of it.

From this map we see that places next to the ocean, especially La Jolla and downtown San Diego, have a very high rent per sqft. This is in stark contrast to apartments further inland, such as in El Cajon, where prices appear to track apartment size more closely.

In the Los Angeles and Orange County region, there is a studio apartment listed for $1 per month. This price is obviously wrong, so I remove it from the data set before exploring the data. Another anomaly is the 4 listings that are each larger than 8000 sqft. These listings have 5, 4, 2, or 1 bedrooms and 3, 2, or 1 bathrooms, which is significantly fewer than we would expect for houses or apartments of this size. In addition, one of the listings states that the unit could fit a 4-seat dining table, or possibly a 6-person dining table, which should be easy in an apartment of 8000+ sqft. Since each of these points greatly detracts from the linearity of the plot of price versus sqft, I chose to remove these outliers for ease of prediction.

From this plot we can see that the linear model for predicting price from square footage suggests that every additional 1.7 square feet increases the monthly rent by 1 dollar.

Each bedroom in an apartment adds approximately $506 of rent per month in the Los Angeles and Orange County area.

When treating bedrooms as a factor, we see that the price increases with each added bedroom, except for the 4th bedroom. This resembles the San Diego data set, where the 4th bedroom adds less value than the 3rd bedroom.

To see the relationship between bathrooms and bedrooms, and whether bedrooms account for the extra cost of a bathroom, we can look at a table of price by number of bathrooms and bedrooms. From this table we see a general trend of price increasing as the number of bathrooms and bedrooms increases. However, at 4 bedrooms and 4 bathrooms we see a price of $899, which is significantly less than all of the prices around it.
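A table of this kind can be sketched as a mean price per (bedrooms, bathrooms) cell. The sketch also records how many listings fall in each cell, since a surprising cell may simply rest on very few listings. That is one possible explanation, which the report does not check. All rows and the `MIN_COUNT` threshold below are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# (bedrooms, bathrooms, monthly price); made-up rows for illustration only.
rows = [
    (1, 1, 1500), (1, 1, 1700),
    (2, 1, 2000),
    (2, 2, 2300), (2, 2, 2500),
    (4, 4, 899),
]

# Collect prices for every (bedrooms, bathrooms) combination.
cells = defaultdict(list)
for bedrooms, bathrooms, price in rows:
    cells[(bedrooms, bathrooms)].append(price)

# Mean price and number of listings per cell.
summary = {key: (mean(prices), len(prices)) for key, prices in sorted(cells.items())}

MIN_COUNT = 2  # assumed threshold below which a cell mean is treated as unreliable
thin_cells = [key for key, (_, n) in summary.items() if n < MIN_COUNT]

for key, (avg, n) in summary.items():
    print(key, avg, n)
print("cells with few listings:", thin_cells)
```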

As with the date differences for the San Diego area, a strong majority of listings in the Los Angeles and Orange County area have availability dates close to the initial posting of the advertisement.

Next we look at a map of the Greater Los Angeles Area with a plot of Apartment Price Per Square Foot. In this plot, the most expensive properties per square foot appear to be in downtown Los Angeles and Long Beach. It is also interesting that the Santa Monica area, famous for its pier and adjacent to Malibu, has a relatively low price per square foot. For the most part, though, the price per square foot stays uniform within similar geographical areas throughout the Los Angeles area.

In the San Francisco Bay Area there are a few outliers that skew the data heavily. One is a listing in Oakland with a size of 1 sqft and a price of $2073. This would obviously skew our data and is not an accurate listing, so I remove it from the sample. Another point I take out is a listing to live on a yacht, because it’s not an apartment. There’s also an apartment in San Francisco listed at 450279 square feet, which is either by far the largest apartment in San Francisco or simply listed wrongly, so I remove it before running statistical tests.

The linear fit for price versus square feet in the Bay Area isn’t great. This is due to geography: the box plot of price per square foot for each county shows a large variation in price from county to county. It therefore makes sense that geographical area has a heavy influence on apartment prices, and that when predicting price from square feet alone, geographical area acts as a lurking variable.

However, if we include county as a factor, our model works quite well, with an adjusted R^2 of 0.7. Thus, when controlling for county, we predict that every additional square foot increases the price by 1.9 dollars.
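The effect of controlling for county can be sketched with toy data. Subtracting each county's own mean from size and price before fitting gives the same size slope as a least-squares model with one dummy variable per county. The two counties and all values below are made up, and the helper names are hypothetical. This is not the report's model.

```python
from collections import defaultdict


def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# (county, sqft, price); made-up rows. Both counties follow a slope of
# 2 dollars per sqft, but county "A" is 1000 dollars more expensive overall.
rows = [
    ("A", 500, 2000), ("A", 1000, 3000),
    ("B", 1000, 2000), ("B", 1500, 3000),
]

# Pooled fit that ignores county.
pooled_slope = slope([r[1] for r in rows], [r[2] for r in rows])

# Within-county fit: demean size and price inside each county first.
groups = defaultdict(list)
for county, sqft, price in rows:
    groups[county].append((sqft, price))

xs_within, ys_within = [], []
for members in groups.values():
    mean_sqft = sum(s for s, _ in members) / len(members)
    mean_price = sum(p for _, p in members) / len(members)
    xs_within.extend(s - mean_sqft for s, _ in members)
    ys_within.extend(p - mean_price for _, p in members)

within_slope = slope(xs_within, ys_within)
print(pooled_slope, within_slope)
```

Here the pooled slope understates the within-county slope because the expensive county happens to have the smaller apartments, which is the lurking-variable effect in miniature.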

Next we look at how much extra bedrooms affect price in the Bay Area. Our linear model suggests that every added bedroom increases the rent by $650.

Next we treat bedrooms as a factor and run another linear model.

From this table we see that there is a subtle increase in price as the number of bedrooms increases for one or two bedrooms, but once the apartment has 3 or more bedrooms the rent increases by about $1000 with each additional bedroom.

In the next table we look at the price of an apartment as the number of bedrooms and bathrooms increases. Bathrooms are represented on the y axis while bedrooms are on the x axis. In this table there seems to be an upward trend in price as both bedrooms and bathrooms increase. However, there seems to be an anomaly at the 4 bedroom and 4 bathroom apartment. Also confusing are the apartments with zero bedrooms and 4+ bathrooms. From this table we can see that when we used our linear model to predict the rise in price per bedroom, we were also partially accounting for an increase in bathrooms, since both increase at similar rates in relation to price.

In the next part of the project we look at how quickly an apartment came up for rent after an ad was featured on Craigslist.

Finally, for the Bay Area data we look at the map of the Bay Area with Apartment Price Per Square Foot plotted on it. From this plot we see that the heart of San Francisco, Palo Alto, and Sacramento tend to have the highest apartment prices per sqft. Other places, such as San Jose, Santa Rosa, and Oakland, tend to have lower prices per sqft. This suggests that more expensive apartments cluster by location.

To see whether highly populated areas have smaller apartments, we revisit the previous three maps. Rather than mapping price per square foot, we now plot apartment size, to see if the metropolitan areas have smaller apartments than the more spread-out suburbs.

Looking at each of the above plots to see whether more densely populated areas tend to have smaller apartments, it appears that the City of San Francisco, one of the most densely populated areas in California, does have a higher proportion of smaller apartments than the greater Bay Area, San Diego, and Los Angeles.

One question I had while exploring this data set is why, in the San Diego and Los Angeles areas, a place with four bedrooms seemed to be less valuable than a place with three bedrooms.

Looking at the summaries of the 4-bedroom and 3-bedroom listings in Los Angeles, the mean price for 3 bedrooms is 300 dollars higher than the mean for 4 bedrooms. The maximum price of the 3-bedroom listings is also higher than that of the 4-bedroom listings. In San Diego, both the mean and the median price are higher for 3-bedroom apartments than for 4-bedroom apartments.

The plot of San Diego illustrates that apartments with three bedrooms happen to be in more expensive neighborhoods or cities than those with four bedrooms.

Another question I had about this data set is what the Bay Area data points with zero bedrooms and multiple bathrooms were. The two listings with 5 bathrooms and the one with 4 bathrooms are as follows.

## [1] Developers, here is the project you have been looking for. Located in a busy area hub with frontage and high visibility on Florin Road. Close to light rail station, schools, shopping, restaurants, and residential areas. Listing includes 4 parcels, 1250-1250-1250, -1250, -1250, -1250; total size is approximately 2.93ac with individual lots varying from .68ac - .74ac. C2, C2-TO zoning. Billboard located on property brings income. Parcels adjoin listing #1250 (2.9ac RMX & RMX-TO zoning). Don't wait!
## 15663 Levels: ________________________________________________________________________________\nOther iPhone & text \n show contact info\n\n\nAvailable July 1st, 2014\n\n*Remodeled unit on the 1st floor\n*2 assigned parking spaces\n*Stainless steel appliances(including dishwasher, fridge/freezer/oven and microwave, Cherry wood cabinets\n*Kitchen has new backsplash and upgraded back patio door\n*Washer and dryer on the patio\n*Complex includes pool which is heated in the summer\n*Two tone wall color\n*Near Highways 5, 805 and 52\n*Walking distance to UCSD, grocery stores (Trader Joes/Ralphs/Whole Foods/etc.), banks, Post Office\n*Next to bus lines, bike route\n*Close to UTC malls, shopping centers, restaurants, night life, nice park and beaches\n*Approximately 1078sf ...
## [1] Amazing Opportunity for Investor or Owner Occupant! Mid-Town Triplex. Upstairs,Victorian 3 bedroom home believed to have been built in 1118 and downstairs apts approx. 1990s. Charm and character with smart upgrades. Spacious upstairs unit approx 1118 sq ft, 3 bdrms, 1.5 ba., Living  and Formal dining area. Two 2/1 units downstairs, functional design. All units have central HVAC & dual pane windows, washer/dryers, downstairs units have tankless water heaters.
## [1] 2333 Channing Way, Berkeley, CA 94704\nSTUDIO  apartments available now for June 15 move in.  \n\nWe will be holding an "open house" featuring the available units each day during the weekday afternoons between the hours of 2pm to 5pm.\n\nCheck out this YouTube video of the STUDIO...\nhttps://www.youtube.com/watch?v=m0rAmksR_nQ&nohtml5=False\n\nThis apartment building provides you with an excellent location! It is only two blocks from the UC Berkeley campus and only a few blocks from the Downtown Berkeley BART station and Telegraph! We offer spacious studios which have exquisitely finished hardwood floors, a modern kitchen, and abundant natural lighting for comfortable living. These are well-maintained and organized units with a homey and relaxing appeal.  Our studios include a big closet, large, new dual paned windows, and high ceilings. \n\nThe lease for this property would begin on June 15, 2016 and continue until May 31, 2017.\n\nAmenities & Features:\n>Hardwood Floors\n>Cable/DSL Ready\n>Modern Kitchen & Bath (Refrigerator, stove included)\n>2 blocks to campus\n>1.5 blocks to Telegraph Avenue\n>Secured building entry system\n>On-site coin-operated Laundry available\n>On-site maintenance team available\n>On-site office management team\n>Water & Garbage provided\n>New seismic foundation & fire sprinkler system;\n>Clean, well-run apartment building;\n\nPlease reply back to this craigslist post with the following subject line: "REALLY INTERESTED IN STUDIO" if interested.\nDo it today, because these units won't stay on the market for long.

The first listing is an offer to sell the property to someone looking to own or develop it, not a rental. The second listing is a multi-family house that is wrongly listed as having zero bedrooms when the upstairs unit alone contains 3. Interestingly, both are from Sacramento and not actually from the Bay Area. The listing with 4 bathrooms and 0 bedrooms is also wrongly listed and is actually a triple apartment. In conclusion, there probably isn’t a genuine studio apartment with multiple bathrooms in the data.

A final question about this data is whether pets are more likely to be allowed in larger apartments. To examine this, I coded pets as a binary variable (allowed or not allowed) so that I could fit a binomial logit model.

In the generalized linear model the sqft coefficient is approximately zero, so the size of the apartment does not appear to be a good predictor of whether the apartment is pet friendly.
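A binomial logit model of pet policy on size can be sketched with plain gradient ascent on the log-likelihood. This is a toy stand-in, not the routine used for the report: the data are made up, `fit_logit` and its step size and iteration count are assumptions, and size is standardized first so that the slope is expressed per standard deviation of sqft rather than per single square foot.

```python
import math


def fit_logit(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1 * z) by gradient ascent, z = standardized x."""
    n = len(xs)
    mean_x = sum(xs) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    zs = [(x - mean_x) / sd_x for x in xs]
    b0 = b1 = 0.0
    for _ in range(steps):
        probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * z))) for z in zs]
        # Gradient of the average log-likelihood with respect to b0 and b1.
        g0 = sum(y - p for y, p in zip(ys, probs)) / n
        g1 = sum((y - p) * z for y, p, z in zip(ys, probs, zs)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Made-up data: pet policy unrelated to size (half allow pets at every size).
sqft_unrelated = [500, 500, 1000, 1000, 1500, 1500]
pets_unrelated = [0, 1, 0, 1, 0, 1]

# Made-up data: pets allowed more often in larger units.
sqft_related = [500, 500, 500, 1000, 1000, 1500, 1500, 1500]
pets_related = [0, 0, 1, 0, 1, 0, 1, 1]

_, slope_unrelated = fit_logit(sqft_unrelated, pets_unrelated)
_, slope_related = fit_logit(sqft_related, pets_related)
print(slope_unrelated, slope_related)
```

A coefficient on raw sqft is a change in log-odds per single square foot, so it can look close to zero partly because of the units. Comparing it with its standard error, or standardizing size as above, makes the comparison with zero more meaningful.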

In this report we found that home prices form similar clusters around the same geolocation. For each of the major metropolitan areas in California, the maps show that higher-priced homes tend to be located downtown and/or next to beaches. Sqft by itself did not seem to be a great indicator of the price of a home, though: none of the plots were clearly linear, so they did not suggest that size alone determines price. Instead, when we looked at how county combined with sqft affected prices in the San Francisco Bay Area, the predictions became significantly more accurate. We could create dummy variables to predict price by city, but since some cities have only a small number of Craigslist postings, we might overfit the model. It therefore makes more sense to look at geolocation and examine the clusters and patterns in how prices are distributed. Techniques that could be used for further statistical analysis include kernel methods, and treating county centers as lattice data and predicting prices using rook neighbors.
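The rook-neighbor idea mentioned above can be sketched on a regular grid, where the rook neighbors of a cell are the cells sharing an edge with it (up, down, left, right). Real counties are irregular polygons, so the grid, its made-up price-per-sqft values, and the helper names are simplifying assumptions for illustration only.

```python
def rook_neighbors(row, col, n_rows, n_cols):
    """Cells sharing an edge with (row, col) inside an n_rows x n_cols grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols]


def neighbor_mean(grid, row, col):
    """Predict a cell's value as the mean of its rook neighbors."""
    neighbors = rook_neighbors(row, col, len(grid), len(grid[0]))
    return sum(grid[r][c] for r, c in neighbors) / len(neighbors)


# Made-up price-per-sqft values on a 3 x 3 lattice.
grid = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]

print(neighbor_mean(grid, 1, 1))  # center cell, four neighbors
print(neighbor_mean(grid, 0, 0))  # corner cell, two neighbors
```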